Abstract

Understanding cultural phenomena on Social Networks (SNs) and exploiting the implicit knowledge about their members is attracting the interest of different research communities, both academic and industrial. The complexity science community is devoting significant effort to defining laws, models, and theories which, based on acquired knowledge, are able to predict future observations (e.g., the success of a product). Meanwhile, the semantic web community aims at engineering a new generation of advanced services by defining constructs, models and methods that add a semantic layer to SNs. In this context, a leap forward is expected from a hybrid approach merging the disciplines above. Along this line, this work focuses on the propagation of individual interests in social networks. The proposed framework consists of the following main components: a method to gather information about the members of the social networks; methods to perform semantic analysis of the Domain of Interest; a procedure to infer members' interests; and an interest evolution theory to predict how interests propagate in the network. The result is an analytic tool to measure individual features, such as members' susceptibilities and authorities. Although the approach applies to any type of social network, here it has been tested against the computer science research community. The DBLP (Digital Bibliography and Library Project) database has been chosen as the test case since it provides the most comprehensive list of scientific production in this field.
Introduction

Social networking platforms (SNPs) collect a huge amount of information 
(and hence implicit knowledge) about their members and different domains of 
interest. Such knowledge concerns interests, friends, best practices, activities 
and other facts of life. General purpose SNPs (e.g., Facebook, Twitter) contain information on several domains (e.g., music, movies, literature, travel), whereas domain-specific SNPs, e.g., aNobii and LinkedIn, collect information on specific topics (e.g., books for aNobii, and jobs and careers for LinkedIn).

According to a 2011 McKinsey industry report [1], the total volume of worldwide dispersed data is increasing at a rate of about 50% per year, that is, roughly a 40-fold growth in ten years. Data storage is becoming almost free since hardware devices are inexpensive, and SNP companies gain added value by gathering data about their members [2]. The management and analysis of the big data involved in social networks are among the most effective activities in the scope of Data Science [2].

Organizing and managing the information conveyed by SNPs in order to extract knowledge about their members is leading the market to a new generation of services focused on specific users' needs. There is a big opportunity for a paradigm shift from decisions based on "gut feelings" to decisions based on data analysis. Advanced applications exploiting social network knowledge can generate value in different sectors, such as security, politics, business, and "social good" [3].

In this paper we study the temporal evolution of people's interests in social networks. The objective is to understand the basic mechanisms and to estimate some individual features, such as susceptibility and authority, that is, to measure the tendency of a person to be influenced by her/his connections and her/his tendency to influence others. In the long run, estimates of these human characteristics can serve as a basis for the development of advanced marketing services targeted to specific individuals.

A social network (SN) consists of a community of "members" linked together by some kind of relationship (e.g., friendship, coauthorship, coworking). A SN is a virtual artifact originating from human activities. Developing a service that leverages SN knowledge requires a hybrid approach based on both engineering and natural science techniques and methodologies [4]. From this perspective, we may study the temporal evolution of people's interests as a dynamic phenomenon arising in an anthropic system. Our hypothesis is that this phenomenon results from the combined action of several factors: people's connections, general trends, pre-existing interests, and people's attitudes both to be influenced by others and to influence them. Furthermore, we deem that, given an application domain, the temporal evolution of interests depends on the topics, since people can be more susceptible to some specific information than to other information: e.g., Americans are usually more interested in the Super Bowl than Europeans are, whereas it is the other way around for the final of the Champions League.

The interest propagation phenomenon in social networks has already been studied by different disciplines [5] through different approaches: data mining, complexity science, semantics, and social science.

In [6] [7], the authors propose a data mining approach to estimate the propagation of events (e.g., threads) and to identify influential members. Most of the efforts in the data mining community have been devoted to defining progressive models. In such models, once a node (member) becomes active (interested in a topic), it remains active. The most important propagation models are the Independent Cascade Model (ICM) and the Linear Threshold Model (LTM), both first introduced in [8]. The key characteristic of ICM is that diffusion events along every arc in the social network are mutually independent, while the key characteristic of LTM is that members change their behaviour if they are exposed to multiple independent sources. Another data mining approach was presented in [9], where the authors propose models and algorithms to learn influence probability parameters from a "social graph" and a log of actions by the users.
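As an illustration of a progressive model, one run of the ICM can be sketched as follows: each newly activated node gets a single, independent chance to activate each inactive out-neighbour, with the probability attached to that arc. The adjacency-list layout, the probability dictionary and the function name are hypothetical choices for this sketch, not the formulation given in [8].

```python
import random


def independent_cascade(graph, prob, seeds, rng=None):
    """One run of the Independent Cascade Model (progressive).

    graph: dict node -> list of out-neighbours (directed arcs)
    prob:  dict (u, v) -> probability that an active u activates v
    seeds: nodes active at the start
    Returns the set of nodes active when the cascade stops.
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, []):
                # Each arc is tried once, independently of all other arcs
                if v not in active and rng.random() < prob.get((u, v), 0.0):
                    active.add(v)  # once active, a node stays active
                    next_frontier.append(v)
        frontier = next_frontier
    return active
```

With all arc probabilities set to 1 the cascade reaches every node reachable from the seeds; with an empty probability map only the seeds stay active.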

Complexity science includes the study of complex networks [10] [11]. Among the phenomena treated by this discipline, epidemic spreading [12, 5] concerns viral processes in networks. Complexity science mainly focuses on human infectious diseases and the spread of software malware. However, there is growing interest in studying topic diffusion in social networks [13], social dynamics [14, 15] and even non-consensus dynamics [16].

Merging the topological and semantic analysis of social networks represents a new and potentially fruitful research field which is providing promising results [17] [18] [19] [20] [21]. Our work shares with these studies the use of a semantic conceptual representation of a Domain of Interest [22] in the social network context.

A social science approach is presented in [23]. There, the authors describe 
an experiment performed on Facebook to estimate influential and susceptible 
members of social networks with respect to some social features, such as age 
and sex. Another interesting issue considered by the social science community is homophily (i.e., the tendency of individuals to choose friends with similar tastes and preferences) [24] [25]. Our work does not deal with such
issues. 

In the context of social science, the concept of "meme" [26] is attracting growing attention as the elementary building block for the evolution of culture and behavior in human communities. As such, the meme is different from our concept of interest; however, some authors [27] have used the term as a synonym of our concept of interest.

We treat the SN as a physical system and model interest dynamics as a diffusion process. In a physical system, thermodynamic equilibrium is reached after a certain time period when no heat source is applied. Similarly, in SNs, the rise of new topics can be considered as a heat source that prevents the equilibrium of interests, thus keeping all people from becoming interested in the same topics.

Our approach is based on the analysis of a social network's connections and of the temporal evolution of the interests of its members. We have defined a general Markov evolution process and tested it on a co-authorship network in computer science. Although our paradigm is very general, we have analyzed in depth the DBLP^ computer science bibliography, which provides an exhaustive list of papers in computer science from the onset of the discipline to the present. In this application, we infer people's interests from the titles of the documents they authored by means of natural language processing (NLP) [28].

The main contribution of this work is a framework consisting of four main 
building blocks: 

• A modelling approach for social networks, to give an explicit specification of SN knowledge concerning people, their relationships and their interests.

• A diffusion theory, to describe interest propagation phenomena and to make predictions about them.

• A method to measure individual features (i.e., people's susceptibility and authority).

• A software application, to assess the theory and to measure individual features.

^DBLP: Digital Bibliography & Library Project. http://www.informatik.uni-trier.de/~ley/db/

In the following, the above-mentioned building blocks are presented along with the outcomes of the analysis of the DBLP dataset. In particular, Section II presents the modelling approach to represent the implicit knowledge of social networks. Section III describes the interest propagation theory. Section IV presents the case study and Section V describes some validations of the theory. Finally, Section VI discusses the findings for the case study.

Knowledge Representation of Social Networks 

In this Section we present our approach to unveil the tacit knowledge encompassed in social networks and to turn it into explicit and formal knowledge [29].

We start from an abstract representation of a SN that results from the set of relationships among the members of the real social network. The second problem is to provide a reliable representation of the semantics of the Domain of Interest [22]. Finally, we need a means to represent the connections between these two systems. These elements provide an abstract representation of the system we intend to deal with at a given time. However, the knowledge contained in the real system is not limited to its instantaneous configuration; it also results from the chronology of events.

As anticipated in the Introduction and pictorially represented in Fig. 1, social networking platforms (SNPs) support the activities of a real social network (SN). The latter can be represented as a semantic social network (SSN) that, in turn, consists of a social network (SN), a semantic network (SeN) [30], and a weighted interest graph (WIG) connecting them.

Interests in Social Networks

A SN can be represented by a directed graph $SoN = (H, F)$, where the set $H$ of nodes $h_i$ represents the members of the social community, $H = \{h_1, h_2, \ldots, h_{|H|}\}$, and the set $F$ of links $f_{i,k}$ represents relationships between members as ordered pairs, $F = \{f_1, f_2, \ldots, f_{|F|}\}$.

Figure 1: A pictorial representation of our approach to semantic social networks. The social networking platform, at the bottom, supports the activities of a real social network. The latter is represented by the semantic social network (the cloud) consisting of the social network (at left), a semantic network (at right), and the weighted links from the interest graph.

Expressions of interest are events (e.g., filling in or changing a profile form, posting, commenting on or "liking" a post, publishing a paper) demonstrating a member's positive attention to a product. All possible products form the Domain of Interest (DoI). It is worth mentioning that the term product is employed here in its broad sense, referring not only to goods but also to cultural events and scientific products such as articles, books, movies, etc.

The Semantic Network 

Conceptual images of products can be expressed in terms of a finite number of concepts belonging to a semantic network representing the DoI. A semantic network can be seen as a graph $SeN = (\Lambda, R)$, where the nodes $\Lambda = \{\lambda_1, \lambda_2, \ldots, \lambda_{|\Lambda|}\}$ are concepts (logos) and the links $R = \{r_1, r_2, \ldots, r_{|R|}\}$ represent semantic relationships of different types between concepts, such as subsumption, meronymy and similarity [31].

Given the semantic structure, we further assume that there exists a set of elementary concepts, which we name "topics", $C = \{c_1, c_2, \ldots, c_{|C|}\}$, such that a subset of topics can be associated with each product (or its abstraction). The identification of basic topics plays a fundamental role and is a critical issue treated by the ontology engineering discipline [32, 33, 34, 35, 36], involving both automatic techniques (such as natural language processing) and validation by domain experts. We discuss this point further in the test case.

The Semantic Profiling Process 

An Interest Graph (IG, see Fig. 1) represents an abstraction of a community of people together with their interests. It can be represented as a bipartite graph consisting of a set of nodes $N$ partitioned into two groups, one representing people and the other, $C = \{c_1, c_2, \ldots, c_{|C|}\}$, representing the topics, and a set of relationships $I$ representing the interests of people in topics. Consequently, $IG = (H, C, I)$, where $I = \{i_1, i_2, \ldots, i_{|I|}\}$ and

$$ i_l = (h_j, c_k) \quad \text{with } h_j \in H \text{ and } c_k \in C. \tag{1} $$

A Weighted Interest Graph WIG is an IG with weights assigned to the links between people and topics. Such weights can represent, for instance, either the probability of being interested in a topic or the degree of interest in it. Consequently, $WIG = (H, C, I, w(I))$, where $w(I)$ is a mapping from the set of relationships $I$ to the range $[0, 1]$:

$$ w : I \to [0, 1] \tag{2} $$

Fig. 2 shows a representation of a weighted interest graph as a weighted 
bipartite graph. 

We define semantic profiling as the process of associating interests with the members of the SN, that is, inferring the links of the WIG and their relative weights. The set of interests characterizing a member $h_i$ is defined as her/his semantic profile $S_{h_i}$:

$$ S_{h_i} = \{ w_k : (h_i, c_k) \in I \} \tag{3} $$

where $c_k \in C$, $k \in \{1, \ldots, |C|\}$, $h_i \in H$ and $w_k = w((h_i, c_k))$.

In other words, the semantic profile of a member is a subset of the interest graph.

Given the basic set of interests $c_k$, one possible choice for providing a member $h_i$ with a semantic profile is to attribute to each topic a likelihood $L_{h_i}(c_k)$.

Figure 2: A representation of a weighted interest graph as a bipartite graph


The semantic profile of each member is not static, and one also needs to account for its temporal evolution:

$$ S_{h_i}(t) = \{ w_k(t) : (h_i, c_k) \in I(t) \} \tag{4} $$

where $c_k \in C(t)$, $k \in \{1, \ldots, |C(t)|\}$, $h_i \in H(t)$ and $w_k(t)$ is the weight $w((h_i, c_k))$ at time $t$.

By introducing the scalar product

$$ (w_{h_i} \cdot w_{h_j}) = \sum_{c_k} w_{h_i}(c_k)\, w_{h_j}(c_k), \tag{5} $$

one may treat the space of semantic profiles as a Euclidean space. One can also define the quadratic distance between two semantic profiles, $d^2(w_{h_i}, w_{h_j}) = ((w_{h_i} - w_{h_j}) \cdot (w_{h_i} - w_{h_j}))$. The smaller the distance, the closer the members' interests.
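Representing semantic profiles as sparse maps from topics to weights, the scalar product of Eq. (5) and the quadratic distance can be computed as below. This is a minimal sketch; the dictionary representation and the topic names in the example are assumptions, not part of the framework.

```python
def scalar_product(w_a, w_b):
    """Eq. (5): sum over topics c_k of w_a(c_k) * w_b(c_k).

    Profiles are dicts topic -> weight; a missing topic has weight 0.
    """
    return sum(w * w_b.get(c, 0.0) for c, w in w_a.items())


def squared_distance(w_a, w_b):
    """Quadratic distance d^2 = ((w_a - w_b) . (w_a - w_b))."""
    topics = set(w_a) | set(w_b)
    return sum((w_a.get(c, 0.0) - w_b.get(c, 0.0)) ** 2 for c in topics)


# Hypothetical profiles of two members
alice = {"databases": 0.6, "data mining": 0.4}
bob = {"data mining": 0.5, "semantic web": 0.5}
similarity = scalar_product(alice, bob)
distance2 = squared_distance(alice, bob)
```

Here the only shared topic is "data mining", so the scalar product is 0.4 × 0.5 = 0.2, and the squared distance is 0.36 + 0.01 + 0.25 = 0.62.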

The Semantic Social Network 

A semantic social network $SSN$ represents the relationships between members, the semantics of the Domain of Interest, and the actual interests of the community of members with their weights. From the mathematical point of view, it can be formally written as a tuple of six entities, $SSN = (H, F, \Lambda, R, I, W_I)$, where $H$ represents the set of members (humans); $F$ represents the relationships between members; $\Lambda$ represents the set of concepts; $R$ represents the set of semantic relationships; $I$ represents the interests of people in topics; and $W_I$ represents the degree of interest of people in topics.

Semantic social networks are like living entities: they are born, grow, shrink and, finally, die (close). The appearance of new nodes may represent both the inclusion of new members and the emergence of novel topics. Similarly, the disappearance of nodes may mimic members ceasing to participate in the community or the obsolescence of topics. Moreover, members' interests in topics may change in intensity over time. To model these dynamics, we define a dynamic semantic social network $SSN_t = (H(t), F(t), \Lambda(t), R(t), I(t), W_I(t))$.

Interest Propagation Dynamics 

In this Section we present a model of interest propagation to predict the evolution of interests in a semantic social network. The model accounts for the structure of the social network and its evolution [37], without predicting the latter. Its objective is to estimate the probability that a person $h_i$ is interested in a topic $c_k$ at a given time.

The resulting equations of dynamics are based on the following four assumptions:

• As a person, each member tends to keep her/his own beliefs. 

• Each member is partly influenced by others interacting with her/him 
(one to one interaction). 

• Each member is partly influenced by trends (one to all interaction). 

• The evolution mechanism is Markovian.

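The evolution equations resulting from the above assumptions can be approximated for short time increments as

$$ L_{h_i}(c_k, t+\Delta t) = \left[1 - x_i(c_k) - x_{is}(c_k)\right] L_{h_i}(c_k, t) + \frac{1}{|N_{h_i}|} \sum_{h_j \in N_{h_i}} x_{ij}(c_k)\, L_{h_j}(c_k, t) + x_{is}(c_k)\, L_s(c_k, t) \tag{6} $$

where $N_{h_i}$ is the set of neighbours of $h_i$.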
The three addends on the right-hand side model, respectively, the personal tendency of a person to keep her/his interest in a topic $c_k$, the influence of the neighbours, and that of the environment. In particular, $L_{h_i}(c_k, t+\Delta t)$ represents the probability that person $h_i$ is interested in the topic $c_k$ at time $t+\Delta t$, and $L_{h_i}(c_k, t)$ the same probability at time $t$. $L_s(c_k, t)$ is the probability that the environment provides some information on interest $c_k$ at time $t$; we refer to this quantity as the "source term". $x_i(c_k)$, $x_{ij}(c_k)$ and $x_{is}(c_k)$ are parameters (to be experimentally determined) that characterise the different individuals. We assume that when all neighbours share the member's own interest profile, the profile should not experience any variation; therefore:

$$ x_i(c_k) = \frac{1}{|N_{h_i}|} \sum_{h_j \in N_{h_i}} x_{ij}(c_k) \tag{7} $$

Similarly, when the single member's profile equals the trends source, no influence is expected; this is why the same coefficient $x_{is}(c_k)$ multiplies both $L_{h_i}$ and $L_s$ in Eq. (6).
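A single update of Eq. (6) for one topic can be sketched as follows. The sketch assumes, as in Eq. (7), that the neighbour contribution is averaged over the $|N_{h_i}|$ neighbours, and that $x_i + x_{is} \le 1$ so that updated values stay in $[0, 1]$. The data layout (dictionaries keyed by member or by ordered member pair) and the name `step` are illustrative choices, not part of the original framework.

```python
def step(L, neighbours, x, x_s, L_s):
    """One discrete-time update of Eq. (6) for a single topic c_k.

    L:          dict member -> L_{h_i}(c_k, t)
    neighbours: dict member -> list of neighbours N_{h_i}
    x:          dict (i, j) -> x_ij(c_k), susceptibility of i to neighbour j
    x_s:        dict member -> x_is(c_k), susceptibility to the environment
    L_s:        source term L_s(c_k, t)
    """
    new_L = {}
    for i, L_i in L.items():
        nbrs = neighbours.get(i, [])
        if nbrs:
            # Eq. (7): x_i is the average of the x_ij over the neighbours
            x_i = sum(x[(i, j)] for j in nbrs) / len(nbrs)
            social = sum(x[(i, j)] * L[j] for j in nbrs) / len(nbrs)
        else:
            x_i, social = 0.0, 0.0
        new_L[i] = (1.0 - x_i - x_s[i]) * L_i + social + x_s[i] * L_s
    return new_L
```

The design makes the invariance property explicit: if every neighbour and the source term already share $h_i$'s value, the three coefficients sum to one and $L_{h_i}$ is left unchanged.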

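In the limit $\Delta t \to 0$ the above discrete-time equations tend to heat-like equations:

$$ \frac{\partial L_{h_i}(c_k, t)}{\partial t} = -\left[\nu_i(c_k) + \nu_{is}(c_k)\right] L_{h_i}(c_k, t) + \frac{1}{|N_{h_i}|} \sum_{h_j \in N_{h_i}} \nu_{ij}(c_k)\, L_{h_j}(c_k, t) + \nu_{is}(c_k)\, L_s(c_k, t) \tag{8} $$

where $\nu_{ij}(c_k) = x_{ij}(c_k)/\Delta t$ (and, analogously, $\nu_i = x_i/\Delta t$ and $\nu_{is} = x_{is}/\Delta t$) represent the rates of susceptibility per unit time. It should be noted that the $x_{ij}(c_k)$ depend on the time increment; this dependence has not been made explicit for the sake of brevity.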

The evolution equations (6) do not lead to consensus, that is, the $L$'s do not converge to a common value. However, the system becomes ergodic (in each of its connected components) when the susceptibilities to the environment are removed. In fact, there exists a suitable weighted average of the $L$'s that is a conserved quantity:

$$ \bar{L}(c_k) = \frac{\sum_{h_i} L_{h_i}(c_k, t)\, b_i(c_k)}{\sum_{h_i} b_i(c_k)} \tag{9} $$

where the constants $b_i$ satisfy the equilibrium equations:

$$ \sum_{h_j \in N_{h_i}} \frac{x_{ji}(c_k)\, b_j(c_k)}{|N_{h_j}|} = b_i(c_k)\, x_i(c_k). \tag{10} $$

When the system is isolated, all the $L$'s tend to $\bar{L}$. However, there are many reasons why this condition is never reached: the topology of the network is dynamic, the susceptibilities may also change over time, and the environmental influence is not negligible.

When all the mutual susceptibilities $x_{ij}$ are equal, Eq. (10) is solved by $b_i = |N_{h_i}|$, and the conserved quantity of Eq. (9) acquires a purely topological form:

$$ \bar{L}(c_k) = \frac{\sum_{h_i} L_{h_i}(c_k, t)\, |N_{h_i}|}{\sum_{h_i} |N_{h_i}|} \tag{11} $$

that is, authors influence the asymptotic "consensus profile" according to their degrees.
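To make the conservation law concrete, the following sketch iterates Eq. (6) on a small undirected network with equal mutual susceptibilities $x$ and no environmental term ($x_{is} = 0$), assuming the neighbour term is averaged over $N_{h_i}$ as in Eq. (7). It checks that the degree-weighted average (the equal-susceptibility form of the conserved quantity of Eq. (9)) stays constant and that all the $L$'s approach it. The function names and the toy star network are hypothetical.

```python
def iterate_isolated(L, neighbours, x, steps):
    """Apply Eq. (6) `steps` times with no environment (x_is = 0) and all
    mutual susceptibilities equal to x. Neighbour lists must be symmetric
    (undirected network) and non-empty."""
    for _ in range(steps):
        L = {i: (1 - x) * L[i] + x * sum(L[j] for j in nb) / len(nb)
             for i, nb in neighbours.items()}
    return L


def degree_weighted_average(L, neighbours):
    """Conserved quantity for equal susceptibilities: weights b_i = |N_{h_i}|."""
    total = sum(len(nb) for nb in neighbours.values())
    return sum(L[i] * len(nb) for i, nb in neighbours.items()) / total


# Hypothetical star-shaped co-authorship network: b collaborated with a, c, d
star = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b"], "d": ["b"]}
L0 = {"a": 1.0, "b": 0.0, "c": 0.0, "d": 0.0}
L_end = iterate_isolated(L0, star, x=0.5, steps=200)
```

The degree-weighted average of `L0` is 1/6, while its plain mean is 1/4; after 200 steps every value in `L_end` is numerically 1/6, so the hub's degree, not the head count, sets the asymptotic profile.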

Key concepts in the interest propagation theory are the individual features, i.e., susceptibility and authority, characterizing a person with respect to a specific Domain of Interest.

According to Merriam-Webster^, susceptibility is defined as the "state of being easily affected, influenced, or harmed by something". Here, in particular, there are three different parameters related to it: $x_{ij}(c_k)$, $x_i(c_k)$ and $x_{is}(c_k)$.

$x_{ij}(c_k)$ is a positive number representing the attitude of a member $h_i$ to be influenced by each of her or his neighbours $h_j$ with respect to the topic $c_k$.

The $x_i$ parameter measures the susceptibility of a member $h_i$ to her/his neighbours' total solicitation with respect to the topic $c_k$. It is given by the average of $x_{ij}$ over all $j$'s (as in Eq. (7)).

Finally, $x_{is}(c_k)$ represents the attitude of a member to be influenced by general trends (i.e., environment or trends susceptibility).

According to Merriam-Webster, authority is the "power to influence or command thought, opinion, or behavior". We may introduce $a_i$, which measures individual authority, as follows:

$$ a_i(c_k) = \sum_{h_j \in N_{h_i}} x_{ji}(c_k) \tag{12} $$

It is worth stressing that the $a_i$'s measure a sort of "local" authority, since the capability to influence the whole system depends on the topology of the social network and its changes over time. The $b_i$'s of Eq. (10) may represent a sort of global authority if the social network were static.
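Given estimated mutual susceptibilities for one topic, the local authority of Eq. (12) reduces to a column sum: for each member, add up how susceptible her/his neighbours are to her/him. The sketch below assumes the unnormalised sum and a dictionary keyed by ordered pairs, where the key `(j, i)` stores $x_{ji}(c_k)$; both are illustrative assumptions.

```python
def local_authority(x):
    """Eq. (12) for one topic: a_i = sum of x_ji over the neighbours h_j of h_i.

    x: dict (j, i) -> x_ji(c_k), the susceptibility of member j to member i.
    Returns dict member -> a_i(c_k); members nobody is susceptible to are omitted.
    """
    authority = {}
    for (j, i), x_ji in x.items():
        authority[i] = authority.get(i, 0.0) + x_ji
    return authority


# Hypothetical susceptibilities: b and c are influenced by a, and a by b
x = {("b", "a"): 0.2, ("c", "a"): 0.3, ("a", "b"): 0.1}
scores = local_authority(x)
```

Member a gets authority 0.5 from two susceptible neighbours, b gets 0.1, and c, to whom nobody is susceptible, does not appear.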

^http://www.merriam-webster.com/

Interest Dynamics in the Research Community

One of the possible applications of our general framework is the analysis of publications in the research community. In this respect, according to our knowledge representation approach, we identify the members of the social network with the authors, connected by co-authorship relationships (representing the edges). In principle, other types of relationships, such as belonging to the same institution or participating in common projects, should also be accounted for. However, to handle such a variety of links, one should resort to the multiplex approach that has been intensively investigated during the last decade [38, 39], which is beyond the scope of this work.

In principle, all the information contained in the whole set of communications (papers, talks, interviews) forms the corpus from which to extract the semantic network. However, access to all this information is impossible and, hence, an approximate conceptualization of the Domain of Interest is necessary.

Expressions of interest can be of different types. The most significant is the publication of new scientific products (e.g., papers, books), but there are others, such as the citation of a work, invitations to conferences, and attendance at talks, seminars, conferences and other presentations. All these events contribute to the semantic profiling of a member.

One of the most difficult tasks for researchers in the field of social network analysis is to obtain a significant dataset to test new methods and software. In fact, even though the hardware to store information is inexpensive, privacy issues and business motivations hinder the process of making these datasets open to the research community. Open data [40] provide a means to counter this tendency, and their availability is crucial for the advancement of research in data science. Moreover, since we are interested in temporal evolution, datasets also need to carry chronological information, which is even harder to obtain.

Fortunately, existing repositories of information about scientific research papers provide free and openly available datasets. In particular, the DBLP dataset provides a comprehensive list of scientific production in computer science.

Computer Science Case Study 

The goal of the experimental evaluation is to test the theory (that is, Eq. (6)) against a real case study represented by the computer science community. In order to perform the analysis, we need to acquire information about the topics defining the scope of the computer science domain and about the evolution dynamics of both the social relationships and the interests of the authors. In principle, this information could be extracted from different sources; however, the DBLP dataset provides both through a single XML document [35].

The analysis of the test case consists of the following steps: 

• papers selection, according to their type (e.g., journal, conference, book 
chapter) and year; 

• interests and topics identification; 

• papers indexing; 

• identification of social network topology and its temporal evolution; 

• semantic profiling of the members; 

• analysis of the trends; 

• assessment of model parameters. 

Papers Selection 

The DBLP database is an evolving entity. Results presented in this work refer to the dataset as published in November 2013, which consists of 2,360,780 papers and 1,337,857 authors. The observation period has been limited to the years from 1950 to 2012. In this temporal range, the number of considered papers is 2,246,098 and the number of authors is 1,337,195. In order to study the evolution of authors' interests, it is necessary to observe some change in their semantic profiles over time; therefore, only authors who have published papers in at least two different years can be analysed. We have named those authors "treatable". It is worth noting that only 519,886 authors out of 1,337,195 were treatable as of 2012. It is reasonable to imagine that this is mainly due to students who publish just one work and then leave the world of research for other activities. It should be noted that non-treatable authors are intrinsically untreatable, i.e., independently of the capability of any particular set of topics to index papers.
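The treatability criterion amounts to counting distinct publication years per author. A minimal sketch, assuming each paper is available as a `(year, authors)` pair (a hypothetical simplification of the DBLP XML records):

```python
def treatable_authors(papers, last_year=2012):
    """Authors with papers in at least two distinct years up to `last_year`.

    papers: iterable of (year, [author, ...]) records.
    """
    years = {}
    for year, authors in papers:
        if year <= last_year:
            for author in authors:
                years.setdefault(author, set()).add(year)
    return {a for a, ys in years.items() if len(ys) >= 2}
```

Two papers in the same year do not make an author treatable, and papers after `last_year` are ignored, mirroring the 1950-2012 observation window.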
Identification of Interests and Topics

As said, all the communications form the potential corpus for the reconstruction of the semantic network. However, we have limited our analysis to the titles. There are other possible choices, such as the abstract and/or the introduction; however, much of the information contained in papers (and specifically in those sections) does not refer to their specific contents but to the general state of the art in the field and, hence, the semantic analysis of the full text (or of the abstract and introduction) could include spurious terms not related to the subject. Moreover, introductions were not available for all papers. Finally, the analysis of millions of full papers is extremely time consuming and may not be sustainable. For all these reasons, limiting the analysis to the titles seems appropriate.

We have used natural language processing techniques [28] to extract multi-lexemes from the corpus of titles. We then validated them by human processing, devoted to removing general-purpose lexemes that are not specific to the computer science domain and to merging synonyms. The resulting list of lexemes forms the set of basic topics $\{c_k\}$ that we have used for the analysis. Recently, new methods based on Latent Dirichlet Allocation have been employed for automated topic extraction [41]; these novel techniques may improve the quality of the set.

Figure 3 shows the evolution of the number of topics detected within a given year, $n_c(t)$. In the range 1950-2012, 7,632 topics have been identified using the TermExtractor web application [42], a tool that extracts the shared terminology of a community from the available documents in a given domain. Figure 3 clearly shows that the field experienced an exponential proliferation of concepts and reached its maturity at the beginning of the 21st century. This is consistent with the common perception of insiders.

We have mentioned that the selection of the basic set of topics $C = \{c_1, c_2, \ldots, c_{|C|}\}$ plays a crucial role in "taming" the Domain of Interest. It is worth stating that, for the diffusion theory to work, the $c$'s must form a "basis" for the algebra of interests. The most relevant relations among concepts (and hence among interests) are generalization (specialization) and similarity. From the algebraic point of view, these represent inclusion relationships. The two constraints we impose on the set $C$ are "completeness" and "independence", respectively.

A set of concepts will be named "complete" when each concept can be seen as the union of a subset of basic concepts:

$$ \forall c \;\; \exists \{i_1, i_2, \ldots\} : \;\; c = \bigcup_{l} c_{i_l} \tag{13} $$

This means that each interest is a combination of a set of basic interests.

On the other hand, a set of concepts will be named "independent" when each pair is disjoint, that is, there does not exist a concept representing a common specialization of both:

$$ \forall c_i, c_j,\; i \neq j : \;\; c_i \cap c_j = \emptyset \tag{14} $$

When Eq. (14) holds, the decomposition of Eq. (13) is unique.
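The two constraints can be checked mechanically if, purely for illustration, each concept is modelled as the set of elementary meanings it covers (a hypothetical representation; the paper does not specify one). Independence is then pairwise disjointness, and the decomposition of Eq. (13) collects the basic topics contained in a concept:

```python
def is_independent(basis):
    """Eq. (14): basic topics must be pairwise disjoint."""
    items = list(basis)
    return all(not (items[a] & items[b])
               for a in range(len(items))
               for b in range(a + 1, len(items)))


def decompose(concept, basis):
    """Eq. (13): the basic topics whose union is `concept`, or None when the
    basis cannot express the concept (i.e. it is not complete for it)."""
    parts = [c for c in basis if c <= concept]
    covered = frozenset().union(*parts)
    return parts if covered == concept else None


# Hypothetical concepts modelled as sets of elementary meanings
db = frozenset({"storage", "query"})
ml = frozenset({"learning"})
web = frozenset({"rdf"})
basis = [db, ml, web]
```

With pairwise disjoint, non-empty basic topics, every valid decomposition must contain each basic topic included in the concept, which is why the decomposition returned here is the only one.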

We identify the basic topics with a subset of the multi-lexemes. The quality of the results strongly depends on the capability of the selected set of multi-lexemes to fulfil the required constraints.

The above analysis treats topics as autonomous entities; however, they belong to a semantic network. The analysis of the semantic structure of the Domain of Interest is beyond the objectives of the present work.


Papers Indexing 

Once the set of topics is established, it is possible to attribute a subset of them to each scientific product. Conversely, each topic $c_k$ can be given a frequency $\nu(c_k, t)$, the number of papers referring to it, and the average frequency of the topics is

$$ \bar{\nu}(t) = \frac{\sum_{k} \nu(c_k, t)}{n_c(t)} \tag{15} $$

where $\nu(c_k, t)$ is the number of occurrences of the topic $c_k$ and $n_c(t)$ is the number of topics up to a given year.
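A toy version of the indexing step and of the average topic frequency of Eq. (15) might look as follows. The matching rule (a topic indexes a paper when its multi-lexeme occurs in the lower-cased title) is a naive assumption made for this sketch; it is not the procedure used with TermExtractor.

```python
def topic_counts(papers, topics, year):
    """nu(c_k, t): number of papers up to `year` indexed by each topic.

    papers: list of (year, title); topics: list of lower-case multi-lexemes.
    Naive rule assumed here: a topic indexes a paper if it occurs in the title.
    """
    counts = {c: 0 for c in topics}
    for y, title in papers:
        if y <= year:
            text = title.lower()
            for c in topics:
                if c in text:
                    counts[c] += 1
    return counts


def average_topic_frequency(papers, topics, year):
    """Eq. (15): total occurrences divided by n_c(t), the topics seen so far."""
    counts = topic_counts(papers, topics, year)
    seen = [n for n in counts.values() if n > 0]
    return sum(seen) / len(seen) if seen else 0.0
```

For example, with the titles "Query optimization in relational databases" (2000) and "Mining association rules in relational databases" (2001), the average frequency by 2001 is (2 + 1) / 2 = 1.5 over the two topics seen so far.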

Figure 3 shows the evolution of the average frequency of the topics over time. Even though the semantic complexity of the field has reached its mature state (no further topic proliferation), production in the field is still growing.


Topology and Evolution of the Social Network

We have focused our research on the time period from 1950 to 2012. For each year, the nodes of the social network are given by the authors who have written papers by that year, and the edges are given by co-authorships. Under this assumption, the social network evolves incrementally over time: at a given year, we attribute a link to all authors who have published a product together by that time. We never remove links and, therefore, our social network shows increasing complexity. It is worth noting that in real life authors stop publishing (e.g., due to retirement or a job change) or give up their collaborations. However, we have not taken such phenomena into account, since we are limiting our work to the information available in the DBLP dataset. Moreover, all members are treated on the same footing regardless of their notoriety or scientific production. These approximations may affect the quality of our results.

Figure 3: Temporal evolution of the number of topics and their average frequency of publication, as resulting from the analysis of the DBLP database by natural language processing.
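The cumulative construction of the co-authorship network can be sketched as below, assuming each paper is given as a `(year, authors)` pair; edges are undirected and never removed, as described above. Function and variable names are hypothetical.

```python
from itertools import combinations


def coauthorship_network(papers, year):
    """Cumulative co-authorship network at `year`.

    papers: iterable of (year, [author, ...]).
    Nodes are authors with at least one paper by `year`; an undirected edge
    joins every pair of co-authors. Links are never removed.
    """
    neighbours = {}
    for y, authors in papers:
        if y > year:
            continue
        for a in authors:
            neighbours.setdefault(a, set())  # single authors are nodes too
        for a, b in combinations(sorted(set(authors)), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours
```

Calling the function for successive years yields a sequence of graphs in which each one contains the previous, matching the monotone growth assumed in the text.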

Consistent with the Barabási-Albert model [10], the observed distribution of the number of collaborators per author is a power law with an exponent oscillating within the [2.5, 3.0] range.
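The text does not say how the exponent was estimated. One common option, shown here only as a sketch, is the continuous maximum-likelihood estimator $\hat{\alpha} = 1 + n / \sum_i \ln(x_i / x_{\min})$ applied to the degrees at or above a chosen cut-off $x_{\min}$; for integer degrees it is an approximation, and the choice of $x_{\min}$ is left to the analyst.

```python
import math
import random


def powerlaw_exponent_mle(values, x_min):
    """Continuous MLE of alpha for p(x) ~ x**(-alpha), x >= x_min:
    alpha = 1 + n / sum(ln(x_i / x_min)) over the n values >= x_min."""
    tail = [v for v in values if v >= x_min]
    return 1.0 + len(tail) / sum(math.log(v / x_min) for v in tail)


# Synthetic check: random.paretovariate(a) has density ~ x**-(a + 1) for x >= 1,
# so a = 1.7 should give an estimate near 2.7.
rng = random.Random(0)
sample = [rng.paretovariate(1.7) for _ in range(20000)]
estimate = powerlaw_exponent_mle(sample, x_min=1.0)
```

Applied to real co-authorship degrees, `values` would be the list of $|N_{h_i}|$ for a given year.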

Semantic Profiling 

For each year in the observation period, a semantic profile can be given to each author by means of the relative frequencies of "expressions of interest" (publications):

$$\ell_{h_i}(c_k, t) = \frac{n_{h_i}(c_k, t)}{\sum_{c_j} n_{h_i}(c_j, t)} \tag{16}$$

where $n_{h_i}(c_k, t)$ represents how many papers, written before the considered year, are indexed by the topic $c_k$. By definition, this function spans the [0, 1] range; the unitary value represents a total interest in the subject, while a null value means no interest at all. These semantic profiles represent our estimates of the probabilities evolving according to eq. (6); that is, one can estimate the likelihood of an author $h_i$ to publish on a topic $c_k$ through her/his share of interest:

$$L_{h_i}(c_k, t) \simeq \ell_{h_i}(c_k, t) \tag{17}$$
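As a sketch of eqs. (16) and (17), the profile of one author can be computed from her/his per-topic paper counts, assuming the counts are normalized over all of the author's topics so that the shares sum to one. The function name and the sample topics are hypothetical.

```python
def semantic_profile(topic_counts):
    """ell_{h_i}(c_k, t) from an author's per-topic paper counts n_{h_i}(c_k, t).

    Each share lies in [0, 1] and the shares sum to 1. An author without
    indexed papers gets an empty (trivial) profile.
    """
    total = sum(topic_counts.values())
    if total == 0:
        return {}
    return {topic: n / total for topic, n in topic_counts.items()}


profile = semantic_profile({"databases": 3, "logic programming": 1})
# Under eq. (17) the share of "databases" also estimates L_{h_i}(databases, t).
print(profile)  # {'databases': 0.75, 'logic programming': 0.25}
```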

Treatable authors (as defined in d) are called "semantically treatable" once they acquire a non-trivial semantic profile. Their number depends on our capability to identify interests related to the domain and to index all papers. In principle, we are able to estimate both the trend and the neighbour susceptibilities of semantically treatable authors. However, the higher the number of publications of an author, the more precise the estimate of her/his susceptibilities.

Combining the semantic profiles of all members, one obtains the interest graph.

To reconstruct the time evolution of the social network, for each year, we have attributed a link to all pairs of members sharing a paper published by that year. We have not weighted links by the age of the shared publications, nor by their number or scientific relevance. These are two strong hypotheses that may be relaxed in future (ongoing) work: human contacts that took place in the remote past may have ceased, and a successful publication may stimulate further joint scientific activity more than a minor one.

Analysis of the Trends 

The analysis of trends in social networks has been exploited in different contexts, including the prediction of the stock market [43]. Generally speaking, one needs a source to evaluate the statistics of different topics. In our case, the popularity of a topic can be estimated by its relative frequency over all published papers:


$$\ell_s(c_k, t) = \frac{\nu(c_k, t)}{\sum_{c_j} \nu(c_j, t)} \tag{18}$$

where $\nu(c_k, t)$ is the frequency of the topic $c_k$ at time $t$. It can be regarded as the likelihood of a random person being interested in the concept at time $t$.

As can be seen from Figure 3, the number of topics has saturated during the last ten years. The same applies to the relative relevance of the topics.

One can estimate the entropy of the Domain of Interest by means of the Shannon entropy of the topic frequencies: $H(t) = -\sum_{c_k} \ell_s(c_k, t) \log \ell_s(c_k, t)$.

It provides an index of dispersion for the relevance of the topics. The entropy also saturates at later times.
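The trend profile of eq. (18) and the entropy index can be sketched in a few lines. The logarithm base is not specified in the text, so the natural logarithm is assumed here; the function names and sample counts are hypothetical.

```python
import math


def trend_profile(topic_frequencies):
    """ell_s(c_k, t): share of each topic among all topic occurrences at time t."""
    total = sum(topic_frequencies.values())
    return {topic: f / total for topic, f in topic_frequencies.items()}


def shannon_entropy(profile):
    """H(t) = -sum_k p_k * ln(p_k); topics with a zero share contribute nothing."""
    return -sum(p * math.log(p) for p in profile.values() if p > 0)


uniform = trend_profile({"a": 5, "b": 5, "c": 5, "d": 5})
peaked = trend_profile({"a": 17, "b": 1, "c": 1, "d": 1})
# A uniform profile reaches the maximum ln(4); a peaked one is less dispersed.
print(shannon_entropy(uniform), shannon_entropy(peaked))
```

Higher values of $H(t)$ indicate that attention is spread over many topics; a value near zero indicates that a few topics dominate.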

Assessing the Interest Propagation Theory 

We have assumed that interests propagate according to the diffusion-like eq. (6); however, the model contains free parameters ($x_{ij}$) that need to be specified. We have formulated three different hypotheses on susceptibility, with increasing levels of complexity, and tested them against the DBLP dataset. The parameters have been fitted by maximum likelihood.

For the sake of simplicity (and to prevent possible overfitting), we have assumed that $x_i$, $x_{ij}$, and $x_{is}$ do not depend on the specific topic $c_k$. This means that a member influences her/his neighbours with the same intensity regardless of the subject.

In general, to estimate the susceptibility parameters, we have constructed the sum of squared differences between the predicted variations of the $L$'s and the observed ones:

$$\chi^2 = \sum_{t, h_i, c_k} \left[\delta L_{h_i}(c_k, t) - \delta\ell_{h_i}(c_k, t)\right]^2 \tag{19}$$

where the symbol $\delta$ indicates the variation of a quantity from one year to the next:

$$\delta\ell(c_k, t) = \ell(c_k, t + \Delta t) - \ell(c_k, t) \tag{20}$$

The optimization is performed using $\chi^2$ as the objective function, that is, by minimizing the deviation of the predictions from the observed values.
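A minimal sketch of eqs. (19) and (20), assuming the predicted and observed variations have already been flattened into lists over $(t, h_i, c_k)$ in the same order; the function names are hypothetical.

```python
def yearly_variation(series):
    """delta f(t) = f(t + dt) - f(t) for a list of yearly values ordered in time."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def chi_square(predicted, observed):
    """Sum of squared differences between predicted and observed variations, eq. (19).

    Both lists are assumed to be flattened over (t, h_i, c_k) in the same order.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


observed = yearly_variation([0.25, 0.5, 0.375])  # [0.25, -0.125]
print(chi_square([0.25, 0.0], observed))  # only the second year misses: 0.125**2 = 0.015625
```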

Since the $L$'s represent likelihoods, they must be confined to the [0, 1] range. This implies that $x_{ij}$ and $x_{is}$ also belong to the same interval. Therefore, the feasible solutions of the optimization process must respect these constraints:

$$\begin{cases} x_{is} \ge 0 \\ x_{ij} \ge 0 \\ x_{ij} + x_{is} \le 1 \end{cases} \tag{21}$$

The optimum values of the parameters are obtained analytically when the point at which the gradient of $\chi^2$ vanishes is a feasible solution:

$$\frac{\partial \chi^2}{\partial x_{ij}} = 0, \qquad \frac{\partial \chi^2}{\partial x_{is}} = 0 \tag{22}$$

When the analytical solution is unfeasible, we assign to the parameters the closest value on the boundary.

The Uniform Environmental Influence Hypothesis (HP1)

The first hypothesis (HP1) that we have taken into account states that all members have the same susceptibility to trends ($x_{is} = x_{s0}$) and are not influenced by neighbours ($x_{ij} = 0$). Accordingly, eq. (6) simplifies as follows:

$$L_{h_i}(c_k, t + \Delta t) = (1 - x_{s0})\, L_{h_i}(c_k, t) + x_{s0}\, L_s(c_k, t) \tag{23}$$

where, as usual, $L_{h_i}(c_k, t)$ is the probability for a person $h_i$ to be interested in the topic $c_k$ at time $t$, and $L_s(c_k, t)$ represents the profile of the media that convey the trends.

Under this basic hypothesis, the only parameter to be estimated is $x_{s0}$, with the constraint $0 \le x_{s0} \le 1$. By introducing the deviation of the semantic profile of each author from the environment

$$\Delta^s_i(c_k, t) = \left[L_s(c_k, t) - L_{h_i}(c_k, t)\right] \tag{24}$$

the objective function becomes

$$\chi^2 = \sum_{t, h_i, c_k} \left[x_{s0}\, \Delta^s_i(c_k, t) - \delta\ell_{h_i}(c_k, t)\right]^2 \tag{25}$$

where $\delta\ell_{h_i}(c_k, t)$ is the variation of the share of interest of a member $h_i$ in the topic $c_k$. Consequently, $x_{s0}$ can be calculated analytically:

$$\frac{\partial \chi^2}{\partial x_{s0}} = 0 \;\Longrightarrow\; x_{s0} = \frac{\sum_{t, h_i, c_k} \Delta^s_i(c_k, t)\, \delta\ell_{h_i}(c_k, t)}{\sum_{t, h_i, c_k} \left[\Delta^s_i(c_k, t)\right]^2} \tag{26}$$
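The closed form of eq. (26) can be sketched as follows, assuming the deviations and the observed variations are given as flat lists over $(t, h_i, c_k)$ in the same order. Clipping to [0, 1] implements the rule of taking the closest feasible value; the function name and the synthetic data are hypothetical.

```python
def fit_uniform_trend_susceptibility(deviations, variations):
    """Least-squares x_s0 from eq. (26), clipped to the feasible interval [0, 1].

    deviations: Delta^s_i(c_k, t) = L_s(c_k, t) - L_{h_i}(c_k, t)
    variations: observed delta ell_{h_i}(c_k, t)
    Both are flat lists over (t, h_i, c_k) in the same order.
    """
    numerator = sum(d * v for d, v in zip(deviations, variations))
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("all deviations are zero: x_s0 is undetermined")
    # For a one-parameter quadratic objective, clipping the unconstrained
    # optimum gives the closest feasible value.
    return min(1.0, max(0.0, numerator / denominator))


# Synthetic check: variations generated with x_s0 = 0.1 are recovered.
deviations = [0.5, -0.25, 0.125]
variations = [0.1 * d for d in deviations]
print(fit_uniform_trend_susceptibility(deviations, variations))
```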


The Uniform Environmental and Neighbors Influence Hypothesis 

The second hypothesis (HP2) that we have taken into account states that all people have the same susceptibility to trends ($x_{is} = x_s$) and to neighbours ($x_{ij} = x$).

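Accordingly, eq. (6) becomes:

$$L_{h_i}(c_k, t + \Delta t) = (1 - x - x_s)\, L_{h_i}(c_k, t) + \frac{x}{|N_{h_i}|} \sum_{h_j \in N_{h_i}} L_{h_j}(c_k, t) + x_s\, L_s(c_k, t) \tag{27}$$

By introducing the average profile of the neighbours

$$L^n_i(c_k, t) = \frac{1}{|N_{h_i}|} \sum_{h_j \in N_{h_i}} L_{h_j}(c_k, t) \tag{28}$$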


and the deviation of the semantic profile of each author from the neighbours:

$$\Delta^n_i(c_k, t) = \left[L^n_i(c_k, t) - L_{h_i}(c_k, t)\right] \tag{29}$$

Then, to estimate the parameters $x_s$ and $x$, the $\chi^2$ becomes:

$$\chi^2 = \sum_{t, h_i, c_k} \left[x\, \Delta^n_i(c_k, t) + x_s\, \Delta^s_i(c_k, t) - \delta\ell_{h_i}(c_k, t)\right]^2 \tag{30}$$

The maximum likelihood equations (from the general eq. (22)) can be written in a compact form by introducing the scalar product

$$((f \cdot g)) = \langle (f \cdot g) \rangle_t = \sum_{c_k, t} f(c_k, t)\, g(c_k, t) \tag{31}$$

where $(f \cdot g)$ is the scalar product of eq. (5), while $\langle \cdot \rangle_t$ represents the temporal average.

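Then

$$\frac{1}{|N_h|} \sum_{h_i} \begin{pmatrix} (([\Delta^n_i]^2)) & ((\Delta^n_i \cdot \Delta^s_i)) \\ ((\Delta^n_i \cdot \Delta^s_i)) & (([\Delta^s_i]^2)) \end{pmatrix} \begin{pmatrix} x \\ x_s \end{pmatrix} = \frac{1}{|N_h|} \sum_{h_i} \begin{pmatrix} ((\delta\ell_{h_i} \cdot \Delta^n_i)) \\ ((\delta\ell_{h_i} \cdot \Delta^s_i)) \end{pmatrix} \tag{32}$$

where $|N_h|$ is the number of authors.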
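The normal equations (32) form a symmetric 2x2 linear system, which can be solved with Cramer's rule. The sketch below assumes flat lists over all $(t, h_i, c_k)$; a common normalization factor, such as an average over authors, appears on both sides and cancels, so plain sums are used. The feasibility constraints (21) are not applied here, and the function name is hypothetical.

```python
def fit_uniform_susceptibilities(dn, ds, dl):
    """Solve the 2x2 normal equations for (x, x_s) by Cramer's rule.

    dn, ds, dl: flat lists of Delta^n_i, Delta^s_i and delta ell_{h_i} values
    over all (t, h_i, c_k), in the same order. Returns None if the system is singular.
    """
    a11 = sum(u * u for u in dn)
    a12 = sum(u * v for u, v in zip(dn, ds))
    a22 = sum(v * v for v in ds)
    b1 = sum(u * w for u, w in zip(dn, dl))
    b2 = sum(v * w for v, w in zip(ds, dl))
    det = a11 * a22 - a12 * a12
    if det == 0:
        return None
    x = (b1 * a22 - a12 * b2) / det
    x_s = (a11 * b2 - a12 * b1) / det
    return x, x_s


# Synthetic check: dl generated with x = 0.05 and x_s = 0.08 is fitted back.
dn, ds = [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]
dl = [0.05 * u + 0.08 * v for u, v in zip(dn, ds)]
print(fit_uniform_susceptibilities(dn, ds, dl))
```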
The Individual Environmental and Neighbors Influence Hypothesis

The third hypothesis (HP3) that we have taken into account states that people have individual susceptibilities both to trends ($x_{is}$) and to neighbours ($x_{ij} = x_i$).

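Accordingly, eq. (6) becomes:

$$L_{h_i}(c_k, t + \Delta t) = (1 - x_i - x_{is})\, L_{h_i}(c_k, t) + \frac{x_i}{|N_{h_i}|} \sum_{h_j \in N_{h_i}} L_{h_j}(c_k, t) + x_{is}\, L_s(c_k, t) \tag{33}$$

The objective function becomes

$$\chi^2 = \sum_{h_i} \chi^2_i, \qquad \chi^2_i = \sum_{t, c_k} \left[x_i\, \Delta^n_i(c_k, t) + x_{is}\, \Delta^s_i(c_k, t) - \delta\ell_{h_i}(c_k, t)\right]^2 \tag{34}$$

Since $x_i$ and $x_{is}$ enter only $\chi^2_i$, each member can be fitted separately, and the optimum solutions are given by the following equations:

$$A \begin{pmatrix} x_i \\ x_{is} \end{pmatrix} = B, \qquad A = \begin{pmatrix} (([\Delta^n_i]^2)) & ((\Delta^n_i \cdot \Delta^s_i)) \\ ((\Delta^s_i \cdot \Delta^n_i)) & (([\Delta^s_i]^2)) \end{pmatrix}, \qquad B = \begin{pmatrix} ((\delta\ell_{h_i} \cdot \Delta^n_i)) \\ ((\delta\ell_{h_i} \cdot \Delta^s_i)) \end{pmatrix} \tag{35}$$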

Table 1 presents a summary of the tested hypotheses of the interest propagation theory and the main numerical results.

Numerical Results 

The results presented in Table 1 show that the quality of the fit improves with the complexity of the model behind the interest propagation theory. In fact, taking into account the number of degrees of freedom (dof) and the value of $\chi^2/\mathrm{dof}$ (a good index of the quality of the fit), HP3 fits the dataset better than HP2 and HP1. The optimization resulted in some negative values of $x_i$ and $x_{is}$; in such cases they are set to zero (HP3a in Table 1). Even if $\chi^2/\mathrm{dof}$ increases, it remains lower than that of HP2. These results support the validity of the interest propagation theory.

In the next subsection, a detailed analysis of the HP3 hypothesis is presented.

HP3 Detailed Results

The HP3 hypothesis is more complex than the others and deserves some further discussion. The analysis of the best-fit equations (35) shows that there are 420290 cases where $\det A \neq 0$ and 11627 cases where $\det A = 0$. Hereafter, $\hat{x}_i$ and $\hat{x}_{is}$ denote the solutions of those equations, when they exist.


Table 1: A summary overview of the different hypotheses.

Hypothesis  Free parameters            Estimated values             χ²/dof
HP1         x_ij = 0, x_is = x_s0      x_s0 = 0.084                 4.606 × 10^-6
HP2         x_ij = x, x_is = x_s       x = 0.051, x_s = 0.053       4.576 × 10^-6
HP3         x_ij = x_i, x_is           x̄ = 0.087, x̄_s = 0.059       3.780 × 10^-6
HP3a        x_ij = x_i ≥ 0, x_is ≥ 0   x̄ = 0.093, x̄_s = 0.071       3.920 × 10^-6

HP1: all people have the same susceptibility to trends and are not influenced by friends.
HP2: all people have the same susceptibility to trends and to neighbours.
HP3: people have individual susceptibilities to trends and to neighbours.
HP3a: people have individual susceptibilities to trends and to neighbours; negative values of x_i and x_is are set to zero.
The χ² values are normalized by the degrees of freedom (dof) for comparison.


Unfortunately, in some cases those solutions are not feasible. In the following, the different cases are analyzed in detail, and their statistical frequencies are presented in Table 2.

Case I corresponds to the most common situation, where the solutions $\hat{x}_i$ and $\hat{x}_{is}$ of eq. (35) are feasible and we can estimate members' susceptibilities to both their neighbours and the trends.

Case II represents the situation in which the solutions of eq. (35) are positive but the constraint $\hat{x}_i + \hat{x}_{is} \le 1$ is violated; we have attributed the closest values at the boundary by rescaling the resulting parameters:

$$x_i = \frac{\hat{x}_i}{\hat{x}_i + \hat{x}_{is}}, \qquad x_{is} = \frac{\hat{x}_{is}}{\hat{x}_i + \hat{x}_{is}}$$

In case III, $\hat{x}_i$ is negative while $\hat{x}_{is}$ is feasible. From the point of view of the diffusion equation, this means that the member is not influenced by the neighbours but by the trends only, and we have to attribute a null value to $x_i$. However, there can be other interpretations. A negative value of $\hat{x}_i$ indicates that the change in the profile has a component directed against $\Delta^n_i$. From the geometrical point of view (see Figure 4), this corresponds to the trigonometric inequality $\cos(\alpha) < \cos(\beta) \cdot \cos(\gamma)$.
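The angles are defined in Figure 4, which is not reproduced in this text; the sketch below assumes that $\alpha$ is the angle between $\delta\ell$ and $\Delta^n$, $\beta$ the angle between $\delta\ell$ and $\Delta^s$, and $\gamma$ the angle between $\Delta^n$ and $\Delta^s$. Under that assumption, and when $\Delta^n$ and $\Delta^s$ are not parallel, the numerator of Cramer's solution for $\hat{x}_i$ is proportional to $\cos\alpha - \cos\beta\cos\gamma$ with a positive factor, so the inequality flags exactly the negative fits. The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def neighbour_coefficient_negative(dn, ds, dl):
    """Check cos(alpha) < cos(beta) * cos(gamma) for one member.

    ASSUMED angles: alpha between delta ell and Delta^n, beta between delta ell
    and Delta^s, gamma between Delta^n and Delta^s. True means that the
    least-squares x_i of eq. (35) is negative.
    """
    return cosine(dl, dn) < cosine(dl, ds) * cosine(dn, ds)


dn, ds = [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]
print(neighbour_coefficient_negative(dn, ds, [-0.1, 0.2, 0.1]))  # built with x_i = -0.1
print(neighbour_coefficient_negative(dn, ds, [0.05, 0.08, 0.13]))  # built with x_i = 0.05
```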

There can exist members who tend to publish on subjects different from those of their neighbours. This can take place when members want to exhibit some kind of independence from other people in the group they belong to. This type of behaviour is not included in our present model; accounting for such an effect would result in non-diffusive dynamics.

Negative values may also be a spurious consequence of the overlap between the different multi-lexemes representing the topics. In fact, if there are similar terms or synonyms in the basic set of topics, angles $\alpha$ between members and their neighbours that are actually small can appear close to $\pi$. Again, we stress that the quality of the semantic analysis plays the most important role.


Figure 4: Pictorial representation of one step of the interest diffusion. Each point represents a possible profile. The black filled dots represent the profile of a member at two subsequent times ($t$ and $t + \Delta t$); the red empty dot represents the average profile of her/his neighbours; and the blue dashed dot represents the profile of the total trends source. The solution of eq. (32) corresponds to the linear combination of $\Delta^n$ and $\Delta^s$ closest to $\delta\ell$.
In case IV, $\hat{x}_i$ is positive while $\hat{x}_{is}$ is negative. As in the previous case, the interpretation from the diffusion equation is that the member is not influenced by the trends. However, also in this case there are alternative interpretations. The most important effect neglected by the theory is innovation: some members introduce new topics, which results in some negative values of $\cos(\beta)$. Current limits of the semantic analysis may also result in spuriously large values of $\beta$.

Case V corresponds to negative values of both $\hat{x}_i$ and $\hat{x}_{is}$. The considerations made for cases III and IV apply here as well.

In case VI, $\det A = 0$ and the matrix is singular, so $x_i$ cannot be determined.
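The case analysis above can be sketched as a per-member routine that solves the 2x2 system of eq. (35) by Cramer's rule and then applies the rules summarized in Table 2. This is a hedged illustration, not the authors' code: the function name is hypothetical, sub-cases are not distinguished, and the handling of $x_{is}$ in case VI and the clipping of $x_{is}$ in case III are assumptions marked in the comments.

```python
def estimate_member(dn, ds, dl):
    """Fit (x_i, x_is) for one member under HP3 and apply the case rules of Table 2.

    dn, ds, dl: the member's Delta^n_i, Delta^s_i and delta ell_{h_i} values,
    flattened over (t, c_k). Returns (case, x_i, x_is); x_i is None in case VI.
    Sub-cases (Ia/Ib/Ic, IVa/IVb, VIa/VIb/VIc) are not distinguished.
    """
    a11 = sum(u * u for u in dn)
    a12 = sum(u * v for u, v in zip(dn, ds))
    a22 = sum(v * v for v in ds)
    b1 = sum(u * w for u, w in zip(dn, dl))
    b2 = sum(v * w for v, w in zip(ds, dl))
    det = a11 * a22 - a12 * a12

    if det == 0:
        # Case VI: x_i cannot be determined. ASSUMPTION: x_is comes from the
        # one-parameter fit on Delta^s alone, clipped to [0, 1].
        x_is = b2 / a22 if a22 else 0.0
        return "VI", None, min(1.0, max(0.0, x_is))

    x_i = (b1 * a22 - a12 * b2) / det   # hat x_i
    x_is = (a11 * b2 - a12 * b1) / det  # hat x_is

    if x_i >= 0 and x_is >= 0:
        total = x_i + x_is
        if total <= 1:
            return "I", x_i, x_is
        return "II", x_i / total, x_is / total  # rescale onto x_i + x_is = 1
    if x_i < 0 and x_is < 0:
        return "V", 0.0, 0.0
    if x_i < 0:
        # ASSUMPTION: clip x_is to 1 (Table 2 only lists 0 <= x_is <= 1 here)
        return "III", 0.0, min(x_is, 1.0)
    return "IV", min(x_i, 1.0), 0.0  # x_is < 0 -> 0; IVb also clips x_i > 1 to 1


dn, ds = [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]
for xi, xs in [(0.05, 0.08), (0.8, 0.6), (-0.1, 0.2), (0.3, -0.1), (-0.1, -0.1)]:
    dl = [xi * u + xs * v for u, v in zip(dn, ds)]
    print(estimate_member(dn, ds, dl))
```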


Table 2: HP3: Individual features estimation

Case  det A  x_i                   x_is                    x_i + x_is  Occurrences
Ia    ≠ 0    x_i = x̂_i > 0         x_is = x̂_is > 0         ≤ 1         191239
Ib    ≠ 0    x_i = x̂_i = 0         x_is = x̂_is = 0         ≤ 1         117473
Ic    ≠ 0    x_i = x̂_i > 0         x_is = x̂_is = 0         ≤ 1         3
II    ≠ 0    x̂_i > 0 (rescaled)    x̂_is > 0 (rescaled)     > 1         462
III   ≠ 0    x̂_i < 0 → x_i = 0     0 ≤ x_is ≤ 1            ≤ 1         54694
IVa   ≠ 0    x_i = x̂_i > 0         x̂_is < 0 → x_is = 0     ≤ 1         55621
IVb   ≠ 0    x̂_i > 1 → x_i = 1     x̂_is < 0 → x_is = 0     > 1         463
V     ≠ 0    x̂_i < 0 → x_i = 0     x̂_is < 0 → x_is = 0     ≤ 1         335
VIa   = 0    undetermined          0 < x_is = x̂_is < 1     n/a         3125
VIb   = 0    undetermined          x_is = x̂_is = 0         n/a         8440
VIc   = 0    undetermined          x̂_is < 0 → x_is = 0     n/a         107

det A is the determinant of the matrix $A$ in eq. (35); $\hat{x}_i$ and $\hat{x}_{is}$ are its solutions, and $x_i$ and $x_{is}$ are the estimated features. Case I: the analytic best-fit solutions are feasible. Case II: members are positively influenced by both the trends and the neighbours, but the analytic best-fit solutions are unfeasible. Case III: members are positively influenced by the trends, but the analytic best-fit solution $\hat{x}_i$ is negative. Case IV: members are positively influenced by the neighbours, but the analytic best-fit solution $\hat{x}_{is}$ is negative. Case V: the analytic best-fit solutions $\hat{x}_i$ and $\hat{x}_{is}$ are both negative. Case VI: the members' susceptibility to their neighbours is undetermined.


The distributions of susceptibilities compatible with the constraints of the diffusion theory are reported in Figures 5 and 6. Under the HP3a hypothesis, the average susceptibility to neighbours is 9.3%, whereas the contribution due to trends is 7.1%, for a total average susceptibility of 16.4%. Roughly speaking, this means that about 85% of the subjects of publications follow the line of previous work, while some 15% exhibit new topics due to the influence of collaborators and trends. The distribution profiles show a very pronounced peak at null susceptibility, while being smooth for other values. Such a peak may result from a genuine effect, but it may also be an artifact of an insufficient semantic analysis. As a matter of fact, there is a large set of papers (1215200 out of 2246098) that cannot be indexed by means of our selected set of topics, and this is expected to result in spurious null susceptibilities. In order to test whether this is the case, we have created a second (wider) set of topics, imposing the constraint that all papers must be indexed. The resulting new set consists of 120917 topics. All papers were indexed; however, due to its size, the semantic analysis of this new set was not accurate enough to prevent undesired synonyms, mutually similar and even fake topics (fake topics had, in fact, been eliminated by human inspection from the original set of 7632 topics). The computational time necessary to test the new wider set against the whole database is prohibitive for our present computational capabilities; therefore, we have limited the analysis to a reduced period of time (from 1985 to 1990).


Table 3: Some statistics of the susceptibilities to trends (x_is) and to neighbours (x_i) obtained with two different sets of indexing topics (time period: 1985-1990)

Indexing set size  x_i < 0   x_i = 0   0 < x_i < 1   x_i > 1
7632               15.01%    43.59%    41.05%        0.35%
120917             21.60%    13.43%    64.46%        0.51%

Indexing set size  x_is < 0  x_is = 0  0 < x_is < 1  x_is > 1
7632               4.40%     38.10%    57.49%        0%
120917             3.19%     3.14%     93.67%        0%


As can be seen from Table 3, the wider set significantly reduces the spurious null susceptibilities to neighbours, while increasing the number of negative ones. This increase can be attributed to the semantic redundancy and quality of the extracted topics. More precisely, similarities tend to assign extra topics to publications that are actually along the line of previous work with old coauthors. Concerning the susceptibility to trends, the extension of the set of basic topics results in a significant decrease of both null values (as expected) and negative ones. This is consistent with the previous interpretation, as similar topics are all represented in the trends, and switching from one to another cannot be read as contrasting the trends.
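The percentages in Table 3 are a breakdown of the fitted values into four categories. A minimal sketch of such a breakdown is given below; the input values are hypothetical, not data from the study, and counting a value exactly equal to 1 in the third column is an assumption, since Table 3 does not list it.

```python
def susceptibility_breakdown(values):
    """Percentage of fitted values in the categories used by Table 3.

    ASSUMPTION: a value exactly equal to 1, which Table 3 does not list,
    is counted with the 0 < x < 1 column.
    """
    n = len(values)
    if n == 0:
        raise ValueError("no values to classify")
    bins = {"< 0": 0, "= 0": 0, "(0, 1]": 0, "> 1": 0}
    for x in values:
        if x < 0:
            bins["< 0"] += 1
        elif x == 0:
            bins["= 0"] += 1
        elif x <= 1:
            bins["(0, 1]"] += 1
        else:
            bins["> 1"] += 1
    return {label: 100.0 * count / n for label, count in bins.items()}


# Hypothetical fitted values, not data from the study.
print(susceptibility_breakdown([-0.2, 0.0, 0.0, 0.5, 1.2]))
```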

It is evident that the semantic analysis is one of the most critical points of our work. Further analyses (the subject of ongoing work) will possibly provide a better set of basic topics with minimum similarity and maximum coverage.

The distributions of susceptibilities obtained by disregarding the constraints of the theory (HP3) are reported in Figures 8 and 9. In this case, the average susceptibility to neighbours is 8.7%, while the contribution from trends is 5.9%, for a total average susceptibility of 14.6%. Compared with the feasible solutions (Figs. 5 and 6), one observes a significant decrease of null values, of about 13% in both cases, corresponding to the negative solutions.

Generally speaking, under the HP3 hypothesis, the susceptibility to neighbours is higher than that to trends ($\bar{x} > \bar{x}_s$). However, the two susceptibilities are anti-correlated (with a correlation coefficient of about $r = -0.4$); that is, fluctuations of the two components partially compensate each other in the total susceptibility. Figure 7 shows the distribution of the $\hat{x}_i$ and $\hat{x}_{is}$ frequencies as a three-dimensional histogram. Most of the values lie in the feasible range ($0 \le x_i \le 1$ and $0 \le x_{is} \le 1$), yet several points lie outside. Moreover, consistently with the observed anti-correlation, there is a significant concentration along the bisector of the second and fourth quadrants. These general considerations are stable against variations of the set of basic topics, while details may be strongly influenced by some ambiguities in the dataset (see below) and by the quality of the semantic analysis.
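The effect of the anti-correlation on the total susceptibility follows from the standard identity for the variance of a sum, where $\sigma_x$ and $\sigma_s$ denote the standard deviations of $x_i$ and $x_{is}$ across members (a worked note, not a result reported in this study):

$$\mathrm{Var}(x_i + x_{is}) = \sigma_x^2 + \sigma_s^2 + 2\, r\, \sigma_x \sigma_s$$

With $r \approx -0.4$ the cross term is negative, so the total fluctuates less than it would for uncorrelated components. Whether it also fluctuates less than each single component depends on the ratio $\sigma_x/\sigma_s$: for $\sigma_x = \sigma_s = \sigma$ the variance of the sum is $2\sigma^2(1 + r)$, which falls below $\sigma^2$ only for $r < -1/2$.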

Figure 5: Histogram of neighbours' fitted susceptibilities (feasible solutions only).

The authority coefficients (authorities, for short) span the [0, 52] range; their mean value is $\bar{a} = 0.44$ and their standard deviation is about twice that value (0.89). Figure 10 shows the histogram of the authority distribution. As can be seen, there is a very long tail of few authors at high values. While there certainly exist real authors with some hundred collaborators, some of the observed ones may be fictitious. It is known that different authors may share the same name (given name and family name); such people are very often treated as a single author in several datasets. This problem is known as "ambiguity" of paper indexing; it results in gathering different authors into a single member of our social network. Owing to the many-to-one transliteration of the original names into Latin characters, Asian members are particularly prone to this effect. The problem is known to affect DBLP data analysis [44]. In order to check this phenomenon, we have built a list of frequent Chinese full names by combining 50 very common Chinese given names [45] with 100 frequent Chinese family names [46]. Figure 11 shows a scatter plot of the authorities as a function of the number of coauthors. Several high values of authority correspond to entries associated with the constructed set of frequent Chinese full names. As is clear from Figure 11, the relationship between the authorities and the number of coauthors is not linear.
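The construction of the list of frequent full names is a Cartesian product of given names and family names. The sketch below uses short hypothetical samples instead of the lists in [45] and [46], and it assumes that names are written given name first; the function names are hypothetical.

```python
from itertools import product

# HYPOTHETICAL short samples; the study combined 50 given names [45] with
# 100 family names [46], which yields up to 5000 full names.
GIVEN_NAMES = ["Wei", "Jing", "Lei"]
FAMILY_NAMES = ["Wang", "Li", "Zhang", "Liu"]


def frequent_full_names(given_names, family_names):
    """Every 'Given Family' combination (ASSUMPTION: given name written first)."""
    return {f"{given} {family}" for given, family in product(given_names, family_names)}


def split_by_frequent_names(authors, frequent_names):
    """Separate author names matching a frequent full name from all the others."""
    flagged = [a for a in authors if a in frequent_names]
    others = [a for a in authors if a not in frequent_names]
    return flagged, others


names = frequent_full_names(GIVEN_NAMES, FAMILY_NAMES)
print(len(names))  # 3 * 4 = 12
print(split_by_frequent_names(["Wei Wang", "Ian Horrocks", "Lei Liu"], names))
```

Authors flagged this way can then be plotted separately, as done with the red crosses in Figure 11.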

Table 4 presents some individual features of some famous authors. As expected, they all exhibit high levels of authority.

We have also tried to relate the success of an author to her/his authority. We have employed the number of published papers as an index of success; however, a more appropriate index would be the total number of citations [47, 48], which was not available. As shown in Figure 12, authority tends to grow with the success index. Figure 13 presents the relationship between the success index and the susceptibility to trends. In this case, there is no clear dependency between the two variables, as the same values of the success index correspond to different values of trend susceptibility. This means that there are successful authors of different types: some of them follow trends, some propose new topics, and some continue working mostly on the same topics.

Figure 6: Histogram of fitted susceptibilities to trends (feasible solutions only).

Discussion 

This paper represents a first attempt to combine known methods of complexity science with the semantic analysis of natural language to provide insights into the propagation of people's interests in social networks. As a first basic model, we have assumed that interests propagate according to a diffusion mechanism, while being continuously created. Human behaviour is described by means of two basic behavioural characteristics (the susceptibility and the authority) that quantify the tendency to be influenced by, and to influence, "friends" and the environment.

Figure 7: Three-dimensional histogram of the frequencies of the fitted $\hat{x}_i$ and $\hat{x}_{is}$. The majority of the values lie in the feasible region, yet significant negatives are observed.

The original ideas were developed with social networks in mind, especially their instantiations on internet platforms. However, the theory also applies to scientific co-authorship networks. Since data for this type of network are easily available, we have tested the consequences of our model on one of them, DBLP, which gathers a wide set of papers published in computer science.

The preliminary results are very encouraging and, in fact, the diffusion mechanism seems to be a leading part of the story. A single thread does propagate (like gossip) according to epidemic equations (6). However, interests in a field are more persistent in people and tend to be shared among "friends", thus leading to the diffusion mechanism.

Figure 8: Histogram of susceptibilities to neighbours. Feasibility constraints are relaxed.

In this preliminary version of the model, many relevant factors have been purposely neglected. Among several others, it is worth stating some of them explicitly:

The ageing of the links [49, 50]. In our model, once a link is established it is supposed to hold forever, whereas in reality links can also vanish for several reasons related to competition, relocation, personal frictions, etc. More generally, we do not take into account the strength of links and its evolution.

The multiplex problem [51, 39, 52]. People have different types of relationships and, hence, there exist multiple networks that may convey their interests, whereas our present model acquires information from only one source. This problem may also be framed in the context of research on hidden links.

The semantic profiling [53, 54]. Generally speaking, one is not allowed to assume that there exists a set of disjoint topics covering all possible interests. Concepts normally overlap, and one interest may induce another one in a "close" topic. Typical examples come from music genres: attributing to a person a specific interest in a genre such as jazz but not in blues is rather unreasonable. In the present model, in order to provide members with a semantic profile, the existence of a basic set of disjoint topics is strictly required. Further developments will take the semantic structure of the Domain of Interest into account and will lead to more complex modelling.

Figure 9: Histogram of susceptibilities to trends. Feasibility constraints are relaxed.

Psychological types. In our model, we only allow people to be influenced by friends (or by the environment) or to be independent of them. However, there are several reasons why a person may deliberately decide to act not just regardless of friends' positions, but in contrast with them. This may be due to competition or simply to a spirit of independence. This type of behaviour is usually referred to as "anti-conformity" and has been studied elsewhere [55, 56, 57].
Figure 10: Histogram of the authority coefficients.

Besides the limits of the theory, there are other factors that need further treatment and have possibly hindered the analysis of DBLP: the semantic analysis and the disambiguation.

Even assuming that there exists a basic set of disjoint topics covering the Domain of Interest, the semantic analysis may only lead to an approximation of it. Small sets of basic topics are not capable of indexing all papers, while larger ones tend to contain synonyms, similar multi-lexemes and, above all, concepts not representing interests (e.g., the words "report", "surveys", etc.). As already discussed in the section "HP3 Detailed Results", in our experimentation one effect of the lack of coverage of the domain is the presence of null values of $x_i$ and $x_{is}$.

Member identification in DBLP suffers from ambiguities. There are author names under which papers by different members are gathered (polysemy) and, vice versa, there are authors who sign different papers with slightly different names (synonymy). This issue is particularly relevant for Asian names. The problem is currently being approached by different authors with promising results [58], [59].

Figure 11: Scatter plot of the relationship between the number of coauthors of an author and her/his authority; scatter plot of the same relationship filtering the most frequent Chinese names. Red crosses represent Chinese authors with a frequent full name, whereas green crosses represent other authors.

The second issue concerns the quality and the completeness [60] of the identified topics.

The third issue concerns semantic profile estimation. Currently, interests are extracted from the titles of the papers by means of natural language techniques. A suitable wider set of sources would allow the detection of a more complete set of topics and, hence, the definition of more accurate profiles.

Despite its limits, the present paper demonstrates that the diffusion of interests on social networks is a reality and that it can be exploited to provide ad hoc services to their members.
Table 4: Famous authors in Computer Science

Name                      x_i      x_is     Authority
Wil M. P. van der Aalst   +0.111   +0.058   +12.809
Jack Dongarra             -0.019   +0.028   +10.259
John Mylopoulos           +0.021   +0.037   +8.852
Georg Gottlob             +0.055   +0.009   +5.081
Ian Horrocks              +0.198   -0.080   +4.835
Maurizio Lenzerini        +0.106   -0.065   +3.128
Erol Gelenbe              +0.184   +0.015   +3.123



Figure 12: Scatter plot of the authority of different semantically treatable authors versus 
their success index. 